Web Usage mining framework for Data Cleaning and IP address Identification

Authors

  • Priyanka Verma
  • Nishtha Kesswani
Abstract

The World Wide Web is the most widely known information source that is easily available and searchable. It consists of billions of interconnected documents (Web pages) authored by millions of people. Accesses made by users to these pages are recorded in web logs, and these log files exist in various formats. As web usage increases, the size of web log files is growing at a much faster rate. Web mining is the application of data mining techniques to these log files. It is of three types: Web usage mining, Web structure mining, and Web content mining. Web usage mining is the mining of users' usage patterns, which can then be used to personalize web sites and make them more attractive. It consists of three main phases: preprocessing, pattern discovery, and pattern analysis. In this paper we focus on the data cleaning and IP address identification stages of preprocessing. A methodology is proposed for both stages. Finally, a conclusion is drawn about the number of users left after IP address identification.

Keywords: Web usage; preprocessing; IP address identification
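The two preprocessing stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' methodology, which the abstract does not spell out. The sketch assumes the Common Log Format, and its cleaning rules (keep only successful GET requests, drop image, stylesheet, script, and robots.txt requests) are assumptions taken from common practice in web usage mining. The names `LOG_PATTERN`, `IGNORED_SUFFIXES`, `clean_log`, `identify_ips`, and the sample log lines are hypothetical.

```python
import re

# Common Log Format pattern. This is an assumption: the abstract notes that
# log files exist in various formats and does not say which one is used.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Assumed list of resource types treated as irrelevant to usage patterns.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def clean_log(lines):
    """Yield parsed entries that survive the assumed cleaning rules."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not parse
        entry = match.groupdict()
        path = entry["url"].split("?", 1)[0].lower()
        if entry["method"] != "GET":
            continue  # assumed rule: keep page requests only
        if not entry["status"].startswith("2"):
            continue  # assumed rule: drop failed or redirected requests
        if path.endswith(IGNORED_SUFFIXES) or path == "/robots.txt":
            continue  # assumed rule: drop embedded resources and robot probes
        yield entry


def identify_ips(entries):
    """Return the set of distinct IP addresses among the cleaned entries."""
    return {entry["ip"] for entry in entries}


# Hypothetical sample data, invented for illustration only.
SAMPLE_LOG = [
    '192.168.1.10 - - [10/Aug/2014:10:00:01 +0530] "GET /index.html HTTP/1.1" 200 2326',
    '192.168.1.10 - - [10/Aug/2014:10:00:02 +0530] "GET /logo.gif HTTP/1.1" 200 512',
    '192.168.1.11 - - [10/Aug/2014:10:01:15 +0530] "GET /missing.html HTTP/1.1" 404 209',
    '192.168.1.12 - - [10/Aug/2014:10:02:30 +0530] "POST /login HTTP/1.1" 200 128',
    '192.168.1.13 - - [10/Aug/2014:10:03:45 +0530] "GET /about.html HTTP/1.1" 200 1804',
]

cleaned = list(clean_log(SAMPLE_LOG))
unique_ips = identify_ips(cleaned)
print(len(cleaned), sorted(unique_ips))
# prints: 2 ['192.168.1.10', '192.168.1.13']
```

In this sketch, IP address identification is reduced to collecting distinct addresses after cleaning, so the user count is simply the size of the resulting set. A single IP address may stand for several users behind a proxy, which this simplification ignores.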

Similar resources

A Survey on Preprocessing Methods for Web Usage Data

The World Wide Web is a huge repository of web pages and links. It provides an abundance of information for Internet users. The growth of the web is tremendous, as approximately one million pages are added daily. Users’ accesses are recorded in web logs. Because of the tremendous usage of the web, web log files are growing at a faster rate and their size is becoming huge. Web data mining is the applicati...

A Novel Semantically-Time-Referrer based Approach of Web Usage Mining for Improved Sessionization in Pre-Processing of Web Log

Web usage mining (WUM), also known as Web Log Mining, is the application of data mining techniques to large volumes of data to extract useful and interesting user behaviour patterns from web logs, in order to improve web-based applications. This paper aims to improve data discovery by mining the usage data from log files. In this paper the work is done in three phases. Firs...

An Efficient Algorithm for Data Cleaning of Web Logs with Spider Navigation Removal

The World Wide Web is growing massively with the exponential growth of websites, providing the user with heaps of information. Text files called web logs store the clicks of a user whenever the user visits a website. Web usage mining is a branch of web mining that applies mining techniques to the server logs containing the user clickstreams....

A Novel Technique for Path Completion in Web Usage Mining

The World Wide Web is a huge repository of web pages and links. The Web mining field encompasses a wide array of issues, primarily aimed at deriving actionable knowledge from the Web, and includes researchers from information retrieval, database technologies, and artificial intelligence. The growth of the web is tremendous, as approximately one million pages are added daily. Users’ accesses are recorded...

A Survey of Preprocessing Method for Web Usage Mining Process

The number of web applications is increasing greatly, and the number of users of web applications is also increasing rapidly. As the number of users increases, the size of the log file also increases. The information stored in log files cannot be used directly for analysis. Therefore, preprocessing of log files is necessary to improve the quality of the web usage mining process. Preprocessi...

Journal:
  • CoRR

Volume abs/1408.5460

Publication year 2014